A Similarity Measurement with Entropy-Based Weighting for Clustering Mixed Numerical and Categorical Datasets
Abstract
Many mixed datasets with both numerical and categorical attributes have been collected in various fields, including medicine and biology. Designing appropriate similarity measurements plays an important role in clustering these datasets. Many traditional measurements treat all attributes equally when measuring similarity. However, different attributes may contribute differently, because the amount of information they contain can vary considerably. In this paper, we propose a similarity measurement with entropy-based weighting for clustering mixed datasets. The numerical data are first transformed into categorical data by an automatic categorization technique. Then, an entropy-based weighting strategy is applied to express the relative importance of the attributes. We incorporate the proposed measurement into an iterative clustering algorithm, and extensive experiments show that this algorithm outperforms the OCIL and K-Prototype methods, with accuracy improvements of 2.13% and 4.28%, respectively, on six mixed datasets from UCI.
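The pipeline the abstract outlines (categorize the numerical attributes, weight every attribute by entropy, then cluster iteratively with a weighted similarity) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' method: equal-width binning stands in for the unspecified automatic categorization technique, each attribute's weight is assumed to be its Shannon entropy normalized to sum to 1, and object-to-cluster similarity is assumed to be the weighted fraction of cluster members that share the object's value. The function names `discretize`, `entropy`, `entropy_weights`, `similarity`, and `cluster` are hypothetical.

```python
import math
from collections import Counter


def discretize(values, n_bins=3):
    """ASSUMPTION: equal-width binning as a stand-in for the paper's
    automatic categorization of a numerical attribute."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant column
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(column):
    """Shannon entropy (in bits) of one categorical attribute."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def entropy_weights(rows):
    """ASSUMPTION: weight = attribute entropy, normalized to sum to 1.
    The abstract does not give the actual weighting formula."""
    h = [entropy(col) for col in zip(*rows)]
    total = sum(h) or 1.0
    return [x / total for x in h]


def similarity(row, members, weights):
    """Weighted object-to-cluster similarity: for each attribute, the
    fraction of cluster members sharing the object's value, times its weight."""
    if not members:
        return 0.0
    return sum(
        w * sum(1 for m in members if m[j] == row[j]) / len(members)
        for j, w in enumerate(weights)
    )


def cluster(rows, k, n_iter=10):
    """Iterative (k-modes-style) assignment using the weighted similarity."""
    weights = entropy_weights(rows)
    labels = [i % k for i in range(len(rows))]  # deterministic initial split
    for _ in range(n_iter):
        groups = [[r for r, lab in zip(rows, labels) if lab == c] for c in range(k)]
        # Reassign each object to the cluster it is most similar to.
        new = [max(range(k), key=lambda c: similarity(r, groups[c], weights))
               for r in rows]
        if new == labels:  # converged: no object changed cluster
            break
        labels = new
    return labels, weights


# Toy mixed dataset: one numerical attribute (age), one categorical (color).
ages = [23, 25, 27, 61, 65, 70]
colors = ["red", "red", "red", "blue", "blue", "blue"]
rows = list(zip(discretize(ages), colors))
labels, weights = cluster(rows, k=2)
print(labels, weights)  # [0, 0, 0, 1, 1, 1] [0.5, 0.5]
```

On this toy input the age bins and colors each split the data evenly, so both attributes receive weight 0.5 and the loop settles on `[0, 0, 0, 1, 1, 1]` after two passes. Whether high-entropy attributes should receive more or less weight is a design choice the abstract leaves open; the direction used here is an assumption.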
Similar resources
Clustering the Mixed Numerical and Categorical Datasets using Similarity Weight and Filter Method
Clustering is a challenging task in data mining. The aim of clustering is to group similar data into a number of clusters. Various clustering algorithms have been developed to group data into clusters. However, these algorithms work effectively only on purely numerical or purely categorical data; most of them perform poorly on mixed categorical and numerical data type...
Entropy-based Consensus for Distributed Data Clustering
The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...
Bi-level clustering of mixed categorical and numerical biomedical data
Biomedical data sets often have mixed categorical and numerical types, where the former represent semantic information on the objects and the latter represent experimental results. We present the BILCOM algorithm for 'Bi-Level Clustering of Mixed categorical and numerical data types'. BILCOM performs a pseudo-Bayesian process, where the prior is categorical clustering. BILCOM partitions biomedi...
Holo-Entropy Based Categorical Data Hierarchical Clustering
Clustering high-dimensional data is a challenging task in data mining, and clustering high-dimensional categorical data is even more challenging because it is more difficult to measure the similarity between categorical objects. Most algorithms assume feature independence when computing similarity between data objects, or make use of computationally demanding techniques such as PCA for numerica...
Journal
Journal title: Algorithms
Year: 2021
ISSN: 1999-4893
DOI: https://doi.org/10.3390/a14060184